DOCUMENT RESUME

ED 078 005                                        TM 002 817

AUTHOR         Bruno, Nancy L.
TITLE          Implementation Procedures for Statewide Assessment.
PUB DATE       Feb 73
NOTE           15p.; Paper presented at National Council for
               Measurement in Education Meeting (New Orleans,
               Louisiana, February 26-28, 1973)

EDRS PRICE     MF-$0.65 HC-$3.29
DESCRIPTORS    Data Analysis; Educational Objectives; *Evaluation
               Methods; Evaluation Techniques; Measurement
               Techniques; *Program Evaluation; Speeches; *State
               Programs; Student Evaluation

ABSTRACT

Six principles for statewide assessment are 
discussed: (1) involve the community; (2) specify and define goals; 
(3) use measuring devices with face and content validity; (4) take 
noncognitive effects of school into account; (5) design data 
presentation for lay understanding; and (6) do not let assessment be 
an end in itself. Decisions to be made in planning and conducting a 
statewide assessment program are also discussed. They involve: goal 
setting, establishing priorities, the number of goals to be assessed, 
the target population, sampling procedures, instrumentation, 
correlates of achievement, data analysis, reporting of results, and 
when to conduct the assessment. (KM) 

PRINCIPLES OF ASSESSMENT 



Although the title of this paper is listed in the program as
"Implementation Procedures for Statewide Assessment," a more appropriate
title probably is "Principles, Pitfalls and Strategies in Statewide
Assessment." The personal experiences of the staff at the Center for
Statewide Educational Assessment and our knowledge of various state
programs led to the formulation of six assessment principles. These were
designed as a guide for state department personnel and others to assist
them in optimizing their chances for a successful assessment program.
Although they have been published elsewhere, I feel so strongly about
their underlying importance in any assessment program that I am going to
repeat them here.

Involve the community. Effective educational assessment demands the 
recognition and involvement of the entire community, i.e., legislators, 
educators, parents, students, business managers, labor leaders and other 
concerned groups. 

Although this can be done at several points in the program, ideally 
the community should be involved to some degree at the very beginning. 

One method of involving them is to have representatives from each 
group assist in determining what the goals for education ought to be. 
Since each group may have different priorities, this could be a
time-consuming activity. The time will be well spent, however, since in addition to 
determining the goals, the participants should also become aware of each 
other's needs and constraints. 

For example, the legislator primarily wants to know how much pupil 
learning and development the money he appropriates for education is buying. 



He must also answer to his constituents, who may not reelect him if they feel 
he is not concerned about the quality of their children's education. 

Teachers have a great interest in assessment since they are directly 
concerned with education. Some may have negative attitudes toward it if
they feel they personally will be evaluated solely on the basis of assessment
results for their students. In addition to the valuable contribution they
can make, they will be less apt to feel threatened when they have been
given the opportunity to participate in the developmental phases of the
program.

Students should certainly have representation in deciding what the 
goals for education ought to be, since those goals most directly concern them
and their future.

Parents want assurance that their children are receiving the kind of 
education that will enable them to cope with the ever increasing complexity 
of the world in which they live. 

The amount and type of community involvement in the assessment program
will depend, to a great extent, on the time constraints within which the
program must operate. If there is sufficient time, it would be advisable to
have a series of regional meetings with representative groups. If pressed
for time, the most efficient way to involve the community is to select an
advisory committee. The members of the committee should be selected so that
all the concerned groups are represented and allowed to contribute. If they
are used merely as a rubber stamp for a fait accompli, the purposes of
community involvement are defeated.

Meaningful early involvement of the various interest groups should 
facilitate understanding and cooperation when the assessment is conducted. 



Specify and define goals. After the broad goals have been identified
and accepted, they must be defined operationally and behaviorally so they
can be measured. The community should continue to be consulted during this
phase, especially the educators.

An example of the need for this type of definition is the goal "To
appreciate human endeavor in the arts." As stated, it is too broad to measure.
To illustrate, one facet of this goal might be to demonstrate an appreciation
of music. An appreciation of music could be defined behaviorally as the
number of times tapes, records and music books are used. This definition
corresponds to the receiving and responding levels in the taxonomy of the
affective domain (Krathwohl et al., 1964). The behavioral objective could
then be measured by a frequency count of the tapes, records and music books
used in the library and those taken off campus for listening and reading.
The number of usages and the proportion of students involved would be an
indicator of the student body's appreciation of music.

Measuring devices must have face and content validity. The instruments
used in the assessment program, whether selected from existing tests or
constructed specifically for the program, should contain an adequate sampling
of the specified universe of content. In addition, they should be face valid,
i.e., the layman must be able to look at the instruments and see the
relationships between them and the goals being measured. For example, if the
objective is to measure understanding and the instrument contains items that
are purely factual in content, the instrument would not have content validity
although it might appear to be face valid. Adequate assessment devices must
have both.

Take noncognitive effects of school into account. Society, for many
reasons, is delegating more and more responsibility to the schools for
developing learning outcomes which are not skills centered. The
appreciation of human endeavor in the arts mentioned earlier is one example.
Another is the development of a positive self concept. Although these
noncognitive areas are admittedly more difficult to measure and interpret,
they must not be ignored in the early phase of an assessment program or they
most likely will continue to be neglected as the program is enlarged. A
second reason for including noncognitive measures early is that people tend
to concentrate efforts on the areas being evaluated. Therefore, failure to
evaluate noncognitive areas has the effect of focusing the educational
process on the skill development segment of education to the neglect of the
equally important noncognitive areas.

Data presentation should be designed for lay understanding. Possibly
the most crucial aspect in determining the success or failure of an
assessment program is the reporting of results. The reports should be in terms 
that are comprehensible to the layman. Interpretation of statistical data, 
particularly that which requires qualification, such as test scores, is 
most effective when interaction between the receiver and presenter is 
possible. However, there is likely to be little interaction if the results 
are reported in sophisticated technical terms. Possible alternatives for 
use in the presentation of data are: expectancy tables based on previous 
year's performance; comparison with state norms; percentage of response to 
each option of the key items; description of the distribution of student 
scores in terms of the kinds of items which they can handle successfully and 
those which present difficulty; and in relation to attainment of the goals. 
The method selected for reporting results will depend on several factors. 
Among these are the uses which will be made of the assessment data, who will 
use the results and the type of instruments used. 

Assessment must not be an end in itself. The last principle, which
perhaps should have been first, is that assessment must be clearly identified
as one component of the total educational process. Evaluative data are
collected to meet specific needs, and if assessment is not related to these
purposes it is useless. Assessment must provide decision makers at various
levels with information that will enable them to make program modifications
necessary for educational improvement. For example, high and low scoring
schools should be observed to determine the activities, materials, etc.
that seem to be making a difference in the student output so that these may 
be tried out in other schools. 



PITFALLS AND STRATEGIES 



Assessment, like education itself, is a dynamic process and may be
initiated from different points depending on the constraints under which
assessment is conducted. If a state has a legislative mandate to assess
specified areas, for instance, it would enter the system at a different
point than one that had no constraints. A state with formalized
educational goals would begin at a different point than one with only
implicit goals.

Before implementing an assessment program there are many questions
that must be answered. In this paper assessment decisions are treated in 
a linear fashion although in reality they are nonlinear and interdependent. 

The "umbrella" question for the entire assessment program is "Why
assess?" The most defensible reason for conducting an assessment is to 
determine the status of education thereby providing decision makers with 
information that will allow them to examine programs that are succeeding 
to determine their generalizability and applicability to other locations 
and student populations. 

Although there are educators and others who say that it is not
necessary to determine explicitly stated goals before assessment, in several
states where assessments have been conducted without them, there has been
a great deal of resistance to the program. On the other hand, programs in
some states that have devoted a great deal of time to determining the goals
and then assessed only the basic skills have also met resistance and even
hostility. It may be that having or not having goal statements had nothing
to do with the success or failure of these programs, and that the
difficulties were instead indicative of poor communication among the
various groups concerned with assessment. In any event, it would seem that
the process of community involvement in deriving goal statements would
enhance a program's chances for success by eliminating unknown factors.

Once there is a set of formal goals, the next step is to determine
their priorities. If the community was not involved formally in stating
the goals, it should certainly be involved at this stage, especially the
educators. This can be accomplished in several ways. Provided there is
sufficient time, it is best to conduct a series of regional meetings with
representatives of the various interest groups. A second alternative is
to select a cross-sectional advisory committee whose members represent the
different publics concerned with and affected by assessment. A third
alternative, the Delphi technique, originally used in military forecasting,
is a means of involving the community without the personal contacts that
occur in regional meetings. Georgia is one state that has used this
technique with apparent success. Each participant rates lists of statements
and writes his comments. Successive mailings with the results of the
previous evaluation are used to reach consensus.
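
As a rough illustration of the feedback step in a Delphi mailing, the
ratings returned for one goal statement might be summarized as below. This
is only a sketch: the 1-5 rating scale, the seven panelists, the function
names and the rule that an interquartile range of 1.0 or less counts as
consensus are all assumptions made for the example, not part of any state's
actual procedure.

```python
from statistics import median, quantiles

def summarize_round(ratings):
    """Return (median, interquartile range) for one statement's ratings."""
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    return median(ratings), q3 - q1

def has_consensus(ratings, max_iqr=1.0):
    """Assumed rule: a statement is settled when the middle half of the
    ratings spans no more than max_iqr scale points."""
    _, iqr = summarize_round(ratings)
    return iqr <= max_iqr

# Hypothetical 1-5 priority ratings from seven panelists for one goal.
round_1 = [1, 2, 3, 4, 5, 5, 2]
round_2 = [3, 3, 4, 4, 4, 4, 3]

for label, ratings in (("round 1", round_1), ("round 2", round_2)):
    mid, iqr = summarize_round(ratings)
    print(label, "median:", mid, "IQR:", iqr,
          "consensus:", has_consensus(ratings))
```

The median and spread from each round are what would be mailed back to the
participants with the next questionnaire.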

After priorities have been established, decisions must be made with 
regard to how many of the goals will be assessed. Financial constraints may 
limit the number, but it is recommended that at least one noncognitive goal
be included in the first assessment. Attitude toward learning and attitude 
toward self would be likely candidates for inclusion since they are related 
to the cognitive areas and there are measuring devices available. 

The question of who will be assessed has two facets: the grades or 
ages of those to be tested and whether there will be universal testing or 
sampling. 



Once again these decisions depend on others. For example, if National
Assessment exercises are chosen, they are only appropriate for certain ages
or grades. In general, states with operational assessment programs are
assessing at least one elementary and one secondary grade. This decision
is usually arbitrary or expedient. The elementary grades 3, 4 and 5 are
frequently selected. Some of the reasons for choosing those grades are:
it is the year no other tests are given; it is the year a test that is part 
of the assessment package is given; tests are available for that grade; or 
there is time to make program changes that will benefit these children. 
Testing at the secondary level ranges from grade 7 to grade 12. At this 
level the problem of dropout and college admissions testing may influence 
the grade selection. The selection is also influenced by whether the data 
will be used to improve programs before the students leave school or to 
assess the overall performance of the schools. 

Following the selection of the population, the decision must be made
whether to test all students or to sample. If the results are to be used
for diagnostic purposes, all students should be tested and instruments must
be long enough to provide reliable results within the diagnostic categories.
To provide a statewide picture of education a sample is sufficient. Because
of the complexity of sampling procedures, it is recommended that a sampling
expert be consulted before making the final decision. Once the lower unit
price for the increased volume in testing all students and the cost of
drawing a sample are computed, it may be as economical to test everyone
in a given grade as to sample.

The political climate may be such that the program would suffer a loss
in credibility if sampling were used. For example, even though politicians
rely on sample based polls to predict their elections, they never trust the
"other guy's" poll. Other problems with sampling are the increased
difficulty in interpreting results and the difficulty in explaining to
parents how the results are representative of their school district or
building when their child was not included in the sample.

Even when sampling is used, large amounts of student time are still
required for the assessment. One way to decrease the amount of student
time is to use matrix sampling, in which both students and items are
sampled. In considering this technique, however, the increased costs for
printing, administration and analysis must be weighed against the time
saved to determine its feasibility. It should be noted that matrix
sampling may allow district or building descriptions, but for any unit
as small as a classroom, sampling will probably not produce sufficiently
reliable results.
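
To make the idea of sampling both students and items concrete, a minimal
sketch of one way to draw a matrix sample follows. The pool sizes, the
number of forms and the function name are hypothetical, and a real design
would stratify by district or building on the advice of a sampling expert;
this simple random version only shows the mechanics.

```python
import random

def matrix_sample(student_ids, item_ids, n_forms, students_per_form, seed=0):
    """Split the item pool into n_forms short forms and give each form
    to its own random subsample of students."""
    rng = random.Random(seed)
    items = list(item_ids)
    rng.shuffle(items)
    # Deal items round-robin so every item appears on exactly one form.
    forms = [items[i::n_forms] for i in range(n_forms)]
    # Draw non-overlapping student subsamples, one group per form.
    sampled = rng.sample(list(student_ids), n_forms * students_per_form)
    assignment = {}
    for f, form_items in enumerate(forms):
        start = f * students_per_form
        for student in sampled[start:start + students_per_form]:
            assignment[student] = form_items
    return assignment

# Hypothetical pool: 1,000 students, 60 items, 4 forms of 15 items each.
plan = matrix_sample(range(1000), range(60), n_forms=4, students_per_form=50)
print(len(plan), "students tested;",
      len(next(iter(plan.values()))), "items per student")
```

Each sampled student answers only 15 of the 60 items, which is where the
saving in student time comes from; the extra forms are the source of the
added printing and analysis costs.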

The next problem is the selection of measuring devices. The alternatives
here are many and the choice depends on the reasons for the assessment and
the uses to be made of the results.

When one of the purposes is to compare the state to a regional or
national group, some type of test for which there are normative data must 
be used. Both standardized achievement tests and the National Assessment 
exercises permit these kinds of comparisons. 

If there is no requirement to make comparisons, additional possibilities
are: (1) construct new instruments locally; (2) contract with a test
publisher to construct instruments to specifications; (3) aggregate the data
from tests routinely administered by school districts. This last alternative 
will become more feasible when the Office of Education releases the data 
from the National Test-Equating Study in Reading. This study equated scores 
for the seven reading comprehension and vocabulary tests most widely used 
for children in grades 4, 5 and 6. 



When intrastate comparisons are to be made from assessment results it
is essential to collect information concerning the conditions of learning
that are related to achievement. It is uninformative and misleading to
compare institutions from a wealthy suburban district with those from a
poor rural or inner-city district without considering the variables that
have been found to be associated with achievement. If compared solely on the
basis of test results, the poorer districts would appear to be doing less for
their students. However, when student background characteristics, teacher
characteristics and financial resources available are also considered, the
poorer districts could be making a more efficient use of their resources to
improve student performance than the wealthier districts.

Probably the most important reason for examining condition variables
is to provide areas for hypothesis generation about the causes of learning
success or failure. Socioeconomic status variables have consistently been
found to be related to school achievement. This does not mean that being
poor or wealthy determines a student's achievement, but it does indicate that
there may be experiences available in communities of differing SES strata
that account for the differing achievement results. Preliminary reports from
a state assessment follow up study of high and low scoring schools indicate
that teachers tend to interact the same with all groups when it is doubtful
that the same approach is effective for all groups. In addition, unless the
achievement related variables are examined, there is no way of knowing whether
previously found relationships are true for a particular state or district in
that state.

A paper by Campbell (in press) reviews the uses of correlates of
achievement and presents some procedures for examining what actually occurs
in classrooms that might account for differences on achievement measures.
Although these procedures could probably not be used on a statewide basis
because of the cost and time, the state could provide financial and
technical assistance to local districts to carry out intensive local
assessments. The state could also serve as a clearinghouse to distribute the
findings from the local studies.

Once all of the data are collected, they must be analyzed. The kinds 
of analyses will be determined primarily by the types of data collected 
and the purposes of the assessment. Frequency distributions, a measure of
central tendency and a measure of variability will provide a description of
the position of the schools in the state and permit comparison with a norm
group.
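
By way of illustration only, the three descriptive summaries just named
could be produced along the following lines. The school scores, the
ten-point band width and the function name are invented for the example;
the mean and the population standard deviation stand in for whichever
measures of central tendency and variability a state actually chooses.

```python
from collections import Counter
from statistics import mean, pstdev

def describe(scores, band_width=10):
    """Frequency distribution by score band, plus mean and standard deviation."""
    bands = Counter((s // band_width) * band_width for s in scores)
    return {
        "frequency": dict(sorted(bands.items())),  # band lower bound -> count
        "mean": mean(scores),
        "sd": pstdev(scores),
    }

# Hypothetical mean reading scores for eight schools.
school_means = [42, 55, 58, 61, 63, 67, 70, 74]
summary = describe(school_means)
for band, count in summary["frequency"].items():
    print(f"{band}-{band + 9}: {'*' * count}")
print("mean:", summary["mean"], "sd:", round(summary["sd"], 2))
```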

When data on condition variables have been collected, but not quantified, 
meaningful analyses are the two-way factorial analysis of variance and the 
Friedman two-way analysis for ranked data. These analyses reveal inter- 
relationships which should be examined further. 
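
For readers unfamiliar with the Friedman analysis, a minimal sketch of the
statistic follows, assuming n blocks (for instance, schools) each measured
or ranked under the same k conditions. It uses the standard formula
12 / (n k (k + 1)) times the sum of squared rank sums, minus 3 n (k + 1),
without the correction for ties; the data and function names are
hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of 1-based positions pos+1 .. end+1
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks

def friedman_statistic(blocks):
    """Friedman chi-square for n blocks (rows) by k conditions (columns).
    No tie correction is applied."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

# Hypothetical data: three schools, each observed under three conditions.
blocks = [[10, 20, 30], [11, 25, 40], [9, 15, 22]]
q = friedman_statistic(blocks)
print("Friedman statistic:", q, "with", len(blocks[0]) - 1, "degrees of freedom")
```

The statistic is referred to a chi-square distribution with k - 1 degrees of
freedom, although with as few blocks as in this toy example exact tables
would be more appropriate.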

If the data collected on condition variables are quantifiable, a factor 
analysis will determine clusters that may yield meaningful interpretations 
in terms of educational implications. 

Multiple correlation procedures allow more complex relationships to be
considered. They provide a method for examining the unique contribution of
many variables in a systematic way. Again, it must be remembered that results
obtained from these analyses do not indicate any causation. From these results,
variables can be identified which should be studied further to determine
possible influences on students' learning experiences. The next step is to
conduct intensive examination of specific learning environments and programs
to determine variables that may be making a difference. The intensive studies
of this type should provide information which will enable changes to be made.
The results of these changes in the program or environment can then be
evaluated.
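
The notion of a "unique contribution" can be illustrated with the simplest
case of two predictors. The sketch below assumes the textbook formula for
the squared multiple correlation computed from the three pairwise
correlations, and takes the unique contribution of the second variable to
be the gain in explained variance over the first alone. The variable roles
and numbers are invented, and an actual assessment with many variables
would rely on a statistical package rather than this hand computation.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def multiple_r_squared(y, x1, x2):
    """Squared multiple correlation of y with two predictors."""
    r_y1, r_y2, r_12 = pearson_r(y, x1), pearson_r(y, x2), pearson_r(x1, x2)
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

def unique_contribution(y, x1, x2):
    """Share of variance in y that x2 explains beyond x1 alone."""
    return multiple_r_squared(y, x1, x2) - pearson_r(y, x1) ** 2

# Hypothetical toy data: an achievement score and two condition variables.
condition_a = [1, 2, 3, 4]
condition_b = [1, -1, -1, 1]
achievement = [2, 1, 2, 5]

print("R squared:", round(multiple_r_squared(achievement, condition_a, condition_b), 3))
print("unique to b:", round(unique_contribution(achievement, condition_a, condition_b), 3))
```

As the text cautions, a large unique contribution flags a variable for
closer study; it does not show that the variable causes the difference.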

Perhaps the most critical phase of an assessment program is the 
reporting of results. Interpretation of statistical data is probably the 
most difficult part of an assessment program for many educators and for 
most of the community at large. 

The ideal method of reporting results is one in which interaction between
the receiver and presenter is possible. This personal contact is especially
important when assessment results are reported to legislators, the state
board of education, the governor's office and other decision makers. If these 
people are only presented with a written report containing masses of data, 
chances are it will never be used (or it will be misused) in making educa- 
tional decisions. This is not to say that there should not be a written 
report. However, the written report should be used as a reference after 
there has been an opportunity for discussion of the results. 

The written report might contain expectancy tables; comparison with 
national, state and regional norms; percentage of response to each option 
of key items that reflect the concept being measured and a description 
of the distribution of children in terms of the kinds of problems with 
which they can deal successfully and those with which they cannot. 
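
An expectancy table of the kind mentioned above can be built with very
little machinery. The following is a sketch under assumed conditions: each
student record is reduced to a prior-year score band and a current outcome,
and the table reports, for each band, the percentage of students reaching
each outcome. The band labels, outcome labels and records are hypothetical.

```python
from collections import defaultdict

def expectancy_table(records):
    """records: iterable of (prior_band, outcome) pairs.
    Returns, for each prior band, the percentage of students per outcome."""
    counts = defaultdict(lambda: defaultdict(int))
    for prior, outcome in records:
        counts[prior][outcome] += 1
    table = {}
    for prior, row in counts.items():
        total = sum(row.values())
        table[prior] = {outcome: round(100 * c / total) for outcome, c in row.items()}
    return table

# Hypothetical records: (last year's score band, this year's result).
records = [
    ("low", "below goal"), ("low", "below goal"),
    ("low", "met goal"), ("low", "below goal"),
    ("high", "met goal"), ("high", "met goal"),
    ("high", "below goal"), ("high", "met goal"),
]
table = expectancy_table(records)
for prior, row in table.items():
    print(prior, row)
```

A table in this form lets a lay reader see at a glance what proportion of
students who started at a given level went on to meet the goal.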

Ideally the reports would be interpreted for each school district at a 
local meeting with administrators, teachers and perhaps some parents. If 
this is not possible because of financial or time constraints, the next 
best alternative is to conduct regional meetings with two or three repre- 
sentatives from each district. The district representatives would then 
report to their districts. 



Whatever method is used to report results, part of the report should 
be devoted to a discussion of the implications and uses of the data. To
tell districts that their students are performing poorly without having
staff available to respond to requests for assistance serves no
educationally useful purpose. Hopefully, the assessment program has not been
conducted in a vacuum and at this point guidance personnel, curriculum
specialists and researchers would be prepared to assist those local districts 
requesting help. 

The operational details of the assessment need to be carefully out- 
lined with a clear delineation of responsibilities and critical dates. 
PERT diagrams are one way of keeping track of the "nitty gritties" of the 
program. There are also computer programs which will provide such things 
as critical dates and manpower needed to keep the assessment activities 
on schedule. 

The last decision to be considered in this paper is when to conduct 
the assessment. The uses of the data and the amount of time needed to 
report the results back to the local school districts are two of the 
parameters that must be considered before determining when the assessment 
is conducted. 

Generally, early fall testing is recommended so that results can be
reported and studied in time for decision makers, at both the state and
local levels, to make use of them in budgetary planning. This recommendation
should not be misinterpreted as implying that assessment results should be
used as a basis for rewarding or punishing local districts. Rather they
should be used to help determine possible areas for change that may or may
not require additional funds. For example, if a superintendent discovered
from the assessment results that two schools in his district were doing
very well in reading and a third was doing very poorly, he might decide
to have his reading specialist spend more time at the school in which 
students performed poorly. If he had no reading specialist, he might 
decide to reallocate some of his funds to hire one. At the state level, 
additional funds might be specifically allocated for hiring a reading 
specialist. 

In summary, after a discussion of six principles of assessment, I
indicated the decisions and some alternative strategies for planning
and conducting a statewide assessment program. These decisions include
goal setting, establishing priorities, the number of goals to be assessed, 
the target population, sampling procedures, instrumentation, correlates 
of achievement, data analysis, reporting of results and when to conduct 
the assessment. 

The final decisions in assessment must be state specific since it is 
highly unlikely that any two states operate under the same constraints 
with the same parameters. 